Internet Engineering Task Force Audio-Video Transport Working Group
INTERNET-DRAFT H. Schulzrinne
AT&T Bell Laboratories
December 15, 1992
Expires: 5/1/93
Media Encodings
Status of this Memo
This document is an Internet Draft. Internet Drafts are working documents
of the Internet Engineering Task Force (IETF), its Areas, and its Working
Groups. Note that other groups may also distribute working documents as
Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six months.
Internet Drafts may be updated, replaced, or obsoleted by other documents
at any time. It is not appropriate to use Internet Drafts as reference
material or to cite them other than as a "working draft" or "work in
progress."
Please check the I-D abstract listing contained in each Internet Draft
directory to learn the current status of this or any other Internet Draft.
Distribution of this document is unlimited.
Abstract
This document describes a possible structure of the media content
for audio and video for Internet applications. The definitions
are independent of the particular transport mechanism used. The
descriptions provide pointers to reference implementations and
the detailed standards. This document is meant as an aid
for implementors of audio, video and other real-time multimedia
applications.
INTERNET-DRAFT Media December 15, 1992
1 Audio
1.1 Encoding-independent recommendations
The following recommendations are default operating parameters.
Applications should be prepared to handle other values. The ranges
given are meant to give guidance to application writers, allowing
a set of applications conforming to these guidelines to interoperate
without additional negotiation. These guidelines are not intended to
restrict operating parameters for applications that can negotiate a set of
interoperable parameters, e.g., through a conference control protocol.
For packetized audio, the default packetization interval should have a
duration of 20 ms, unless otherwise noted in Table 1. For frame-based
encodings (marked as F in Table 1 below) such as LPC, CELP and GSM, the
sender may choose to combine several frame intervals into a single message
to reduce header overhead. The receiver can tell the number of frames
contained in a message since the nominal frame duration is defined as part
of the encoding.
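The packetization arithmetic above can be sketched in C as follows. This
is only an illustration: the function names and parameters are hypothetical
and not part of this document, and the frame-count helper assumes that
frames are packed back to back on octet boundaries (e.g., a 30 ms frame at
4.8 kb/s occupies 144 bits, or 18 octets).

```c
#include <stddef.h>

/* Hypothetical helper: octets of payload produced by one packetization
 * interval of a sample-based encoding.
 *   bit_rate_bps: encoding bit rate in bits per second (64000 for PCMU)
 *   interval_ms:  packetization interval in milliseconds (default 20) */
size_t payload_octets(unsigned long bit_rate_bps, unsigned interval_ms)
{
    unsigned long bits = bit_rate_bps * interval_ms / 1000UL;
    return (size_t)((bits + 7UL) / 8UL);   /* round up to whole octets */
}

/* Hypothetical helper: number of complete frames in a message, assuming
 * every frame occupies exactly frame_octets octets. Returns 0 if the
 * frame size is unknown (zero). */
size_t frames_in_message(size_t message_octets, size_t frame_octets)
{
    if (frame_octets == 0)
        return 0;
    return message_octets / frame_octets;
}
```

With the default 20 ms interval, payload_octets(64000, 20) gives the
payload size of one PCMU packet, and frames_in_message(54, 18) gives the
number of 18-octet frames in a 54-octet message.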
If multiple channels are used, the left-channel information always precedes
the right-channel information. For more than two channels, the channel
ordering convention of the AIFF-C audio interchange format should be
followed. It is listed in the table below. (The AIFF-C specification is
available by anonymous ftp at sgi.sgi.com in the file
sgi/aiff-c.9.26.91.ps.)
type        channels
_________________________________________________________________________
stereo      left, right
3 channel   left, right, center
quad        front left, front right, rear left, rear right
4 channel   left, center, right, surround
6 channel   left, left center, center, right, right center, surround
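As a minimal sketch of the channel ordering, the following C function
interleaves separate per-channel buffers so that channel 0 (left) precedes
channel 1 (right), and so on in table order. Interleaving sample by sample
is an assumption made for illustration; the function name and signature are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interleave n_channels buffers of n_samples 16-bit
 * samples each. chan[0] is the first channel in the table above (left),
 * chan[1] the second (right), and so on. The output buffer out must hold
 * n_channels * n_samples samples. */
void interleave_channels(const int16_t *const *chan, size_t n_channels,
                         size_t n_samples, int16_t *out)
{
    size_t i, c;

    for (i = 0; i < n_samples; i++)
        for (c = 0; c < n_channels; c++)
            out[i * n_channels + c] = chan[c][i];
}
```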
The sampling frequency should be drawn from the set: 8, 11.025, 16, 22.05,
44.1 and 48 kHz. Preferred rates are 8, 16 and 44.1 kHz.
1.2 Recommended Audio Encodings
Table 1 shows the names, types (sample vs. frame oriented) and default
sampling frequencies of recommended encodings. The list is partially
drawn from the document ``Recommended practices for enhancing digital
audio compatibility in multimedia systems'', published by the Interactive
Multimedia Association, Version 3.00, Oct. 1992 (referenced as [IMA]).
H. Schulzrinne Expires 5/1/93 [Page 2]
The names are for identification only; they correspond to the names used
within the Real-Time Transport Protocol (RTP). Other applications may choose
different namings.
name    nom. sampling  rate    type  frame  description
        kHz            kb/s    S/F   ms
_________________________________________________________________________
L16     44.1           705.6   S            16-bit linear, 2's complement
G722    16             64      S            CCITT subband ADPCM
PCMU    8              64      S            CCITT mu-law PCM
PCMA    8              64      S            CCITT A-law PCM
G721    8              32      S            CCITT ADPCM
DVI     8              32      S            Intel/DVI ADPCM [IMA]
G723    8              24      S            CCITT ADPCM
GSM     8              13      F     20     RPE/LTP (GSM 06.10)
1016    8              4.8     F     30     CELP
_________________________________________________________________________
Table 1: Audio encodings
For multi-octet encodings, octets are transmitted in network byte order
(i.e., most significant octet first).
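For a 16-bit encoding such as L16, network byte order can be produced
independently of the host's byte order, as in the following C sketch. The
function names are hypothetical and chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write n 16-bit samples to out (2 * n octets),
 * most significant octet first. */
void l16_to_network(const int16_t *samples, size_t n, uint8_t *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        uint16_t u = (uint16_t)samples[i];     /* two's complement bits */
        out[2 * i]     = (uint8_t)(u >> 8);    /* most significant octet */
        out[2 * i + 1] = (uint8_t)(u & 0xFF);  /* least significant octet */
    }
}

/* Hypothetical helper: read one sample from two octets in network order. */
int16_t l16_from_network(const uint8_t *in)
{
    uint16_t u = (uint16_t)(((unsigned)in[0] << 8) | in[1]);

    /* Map the two's complement bit pattern back to a signed value without
     * relying on implementation-defined conversions. */
    if (u & 0x8000u)
        return (int16_t)((int32_t)u - 65536);
    return (int16_t)u;
}
```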
A detailed description of the encodings is given below:
L16 denotes uncompressed audio data, using 16-bit signed representation
with 65535 equally divided steps between minimum and maximum signal
level, ranging from -32768 to 32767. The value is represented in two's
complement notation.
PCMU is specified in CCITT recommendation G.711. Audio data is encoded
as eight bits per sample, after companding. Code to convert between
linear and mu-law companded data is available in the IMA document.
PCMA is specified in CCITT recommendation G.711. Audio data is encoded
as eight bits per sample, after companding. Code to convert between
linear and A-law companded data is available in the IMA document.
G721 through G729 are specified in the corresponding CCITT recommendations.
Reference implementations for G.721 and G.723 are available as part of
the CCITT Software Tool Library (STL) from the ITU General Secretariat,
Sales Service, Place des Nations, CH-1211 Geneve 20, Switzerland. The
library is covered by a license and is available by anonymous ftp from
gaia.cs.umass.edu, file pub/ccitt/ccitt_tools.tar.Z.
GSM (Groupe Special Mobile) denotes the European GSM 06.10 provisional
standard for full-rate speech transcoding, prI-ETS 300 036, which
is based on RPE/LTP (residual pulse excitation/long term prediction)
coding at a rate of 13 kb/s. A reference implementation was written by
Carsten Bormann and Jutta Degener (TU Berlin, Germany) and is available
for anonymous ftp from tub.cs.tu-berlin.de, directory tub/tubmik.
1016 uses code-excited linear prediction (CELP) and is specified in Federal
Standard FED-STD 1016, published by the Office of Technology and
Standards, Washington, DC 20305-2010.
The U.S. DoD's Federal-Standard-1016 based 4800 bps code excited
linear prediction voice coder version 3.2 (CELP 3.2) Fortran and
C simulation source code is available for worldwide distribution
at no charge (on DOS diskettes, but configured to compile on Sun
SPARC stations) from: Bob Fenichel, National Communications System,
Washington, D.C. 20305, phone +1-703-692-2124, fax +1-703-746-4960.
Example input and processed speech files, a technical information
bulletin, and the official standard ``Federal Standard 1016,
Telecommunications: Analog to Digital Conversion of Radio Voice by 4,800
bit/second Code Excited Linear Prediction (CELP)'' are included at no
charge. According to Vincent Cate (Carnegie Mellon), the distribution
is also available by anonymous ftp at furmint.nectar.cs.cmu.edu
(128.2.209.111) in directory celp.audio.compression.
The following articles describe the Federal-Standard-1016 4.8-kbps
CELP coder:
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, ``The
Proposed Federal Standard 1016 4800 bps Voice Coder: CELP,'' Speech
Technology Magazine, April/May 1990, p. 58-64.
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, ``The
Federal Standard 1016 4800 bps CELP Voice Coder,'' Digital Signal
Processing, Academic Press, 1991, Vol. 1, No. 3, p. 145-155.
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, ``The
DoD 4.8 kbps Standard (Proposed Federal Standard 1016),'' in Advances
in Speech Coding, ed. Atal, Cuperman and Gersho, Kluwer Academic
Publishers, 1991, Chapter 12, p. 121-133.
Copies of the FS-1016 document are available for $2.50 each from:
GSA Rm 6654
7th & D St SW
Washington, D.C. 20407
1-202-708-9205
DVI/ADPCM is specified in the ``Recommended Practices for Enhancing
Digital Audio Compatibility in Multimedia Systems'', published by the
Interactive Multimedia Association (IMA), Annapolis, MD. The document
also contains reference implementations for mu-law to 16-bit, ADPCM and
sample rate conversions.
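To illustrate the companding step mentioned for PCMU above, the following
C sketch converts one 16-bit linear sample to an 8-bit mu-law code word. It
is a commonly used segment-search formulation of G.711 mu-law encoding, not
the IMA or CCITT reference code; the function and constant names are
hypothetical.

```c
#include <stdint.h>

#define ULAW_BIAS 0x84   /* 132, added to the magnitude before the search */
#define ULAW_CLIP 32635  /* largest magnitude that still fits after bias */

/* Hypothetical helper: encode a 16-bit linear sample as 8-bit mu-law.
 * The code word holds sign (1 bit), segment (3 bits) and mantissa
 * (4 bits), and is transmitted with all bits inverted. */
uint8_t linear_to_ulaw(int16_t sample)
{
    int32_t mag = sample;
    int32_t mask;
    int sign = 0;
    int exponent = 7;
    int mantissa;

    if (mag < 0) {
        mag = -mag;
        sign = 0x80;
    }
    if (mag > ULAW_CLIP)
        mag = ULAW_CLIP;
    mag += ULAW_BIAS;

    /* Find the segment: position of the highest set bit, from bit 14 down. */
    for (mask = 0x4000; exponent > 0 && (mag & mask) == 0; mask >>= 1)
        exponent--;

    mantissa = (int)((mag >> (exponent + 3)) & 0x0F);
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}
```

Under this formulation, a zero input maps to the code word 0xFF and the
largest positive input to 0x80.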
For sample-based encodings, a receiver should accept packets representing
between 0 and 200 ms of audio data.(1) Receivers should be prepared to
accept multi-channel audio, but may choose to only play a single channel.
1.3 API for Codecs
The application programming interface described here is suggested, but not
required for interoperability. The API shown here is compatible with the
one used by SunOS 4.1.
#define AUDIO_ENCODING_ULAW   (1) /* ISDN u-law */
#define AUDIO_ENCODING_ALAW   (2) /* ISDN A-law */
#define AUDIO_ENCODING_LINEAR (3) /* PCM 2's-complement (0-center) */

typedef struct {
    unsigned sample_rate;      /* samples per second */
    unsigned samples_per_unit; /* samples per unit */
    unsigned bytes_per_unit;   /* bytes per sample unit */
    unsigned channels;         /* # of interleaved channels */
    unsigned encoding;         /* data encoding format */
    unsigned data_size;        /* length of data (optional) */
} audio_hdr_t;

void *x_init(void *state, double period);
int x_encode(void *in_buf, int in_size, audio_hdr_t *ah,
             void *out_buf, int *out_size, void *state);
int x_decode(void *in_buf, int in_size, audio_hdr_t *ah,
             void *out_buf, int *out_size, void *state);
x_init initializes a particular instance of a codec. If the argument state
is zero, a memory area sufficient to hold the encoder or decoder state is
allocated; if that argument is non-zero, the existing area is reinitialized.
The function returns a pointer to the area, or zero if the state area could
not be allocated. The argument period refers to the amount of audio data in
each block. It is typically only used for block-oriented codecs.
------------------------------
1. This restriction allows reasonable buffer sizing for the receiver.
The generic pointer to state refers to an area of storage whose structure is
opaque to the application program. In the functions, 'x' is replaced by the
codec name, modified as needed to conform to C syntax (e.g., g711, g721,
etc.).
The encoder and decoder transform the data contained in the input buffer
in_buf (in_size bytes) and deposit the result into the output buffer area
out_buf. The variable out_size is set to the number of bytes actually
contained in the output buffer. The ah argument points to a structure of
type audio_hdr_t, which defines the given input data format for the encoder
and the desired output data format for the decoder. The functions return 0
on success and a negative number if a failure occurred.
All block-oriented audio codecs should be able to encode and decode several
consecutive blocks.
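The calling convention of section 1.3 can be illustrated with a trivial
pass-through codec. The codec name "null", its state structure and its
behavior are hypothetical and exist only to show the conventions for state
handling and return values; a real codec would transform the data and
honor the format given in the audio_hdr_t structure.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned sample_rate;      /* samples per second */
    unsigned samples_per_unit; /* samples per unit */
    unsigned bytes_per_unit;   /* bytes per sample unit */
    unsigned channels;         /* # of interleaved channels */
    unsigned encoding;         /* data encoding format */
    unsigned data_size;        /* length of data (optional) */
} audio_hdr_t;

/* Hypothetical state; its layout is opaque to the application. */
typedef struct {
    unsigned long octets_processed;
} null_state_t;

/* Allocate the state if state is zero, otherwise reinitialize it.
 * period is ignored because this codec is not block-oriented. */
void *null_init(void *state, double period)
{
    (void)period;
    if (state == NULL)
        state = malloc(sizeof(null_state_t));
    if (state != NULL)
        memset(state, 0, sizeof(null_state_t));
    return state;   /* zero if the allocation failed */
}

/* Copy in_size bytes unchanged; return 0 on success, negative on failure.
 * The caller must provide an output buffer of at least in_size bytes. */
int null_encode(void *in_buf, int in_size, audio_hdr_t *ah,
                void *out_buf, int *out_size, void *state)
{
    null_state_t *st = state;

    if (in_buf == NULL || out_buf == NULL || out_size == NULL ||
        ah == NULL || st == NULL || in_size < 0)
        return -1;
    memcpy(out_buf, in_buf, (size_t)in_size);
    *out_size = in_size;
    st->octets_processed += (unsigned long)in_size;
    return 0;
}

/* Decoding is the same pass-through operation for this codec. */
int null_decode(void *in_buf, int in_size, audio_hdr_t *ah,
                void *out_buf, int *out_size, void *state)
{
    return null_encode(in_buf, in_size, ah, out_buf, out_size, state);
}
```

An application would call null_init(0, period) once per codec instance,
pass the returned pointer to every null_encode or null_decode call, and
release the area with free() when done, since it was obtained with malloc().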
2 Video
For further study. (To contain definitions and pointers of H.261, etc.).
Items to include: H.261 frame format, request for retransmission.
3 Address of Author
Henning Schulzrinne
AT&T Bell Laboratories
MH 2A244
600 Mountain Avenue
Murray Hill, NJ 07974
telephone: 908 582-2262
electronic mail: hgs@research.att.com